Abstract

Let G = (V, E) be a network for an assignment problem with 2n nodes and m
edges, in which the largest edge cost is C. Recently the class of instances of
bipartite matching problems has been shown to be in RNC provided that C is
$O(\log^k n)$ for some fixed k. We show how to use scaling so as to develop an
improved parallel algorithm and show that bipartite matching problems are in
the class RNC provided that $C = O(n^{\log^k n})$ for some fixed k. We then
generalize these results to minimum-cost flow problems. Let U be an upper bound
on the capacities of the edges and on the largest demand. We show that the
minimum-cost flow problem is in the class RNC, provided that
$\log(C + U) = O(\log^k n)$ for some fixed k. Thus the minimum-cost flow problem
is in the class RNC even when the magnitudes of the costs and capacities are
allowed to grow faster than any polynomial in n. The key to our approach is to
reduce the number of processors needed from an amount that is proportional to
the magnitude of the largest edge cost to an amount that is independent of the
magnitude of the largest edge cost. The tradeoff is an increase in the running
time that grows linearly in $\log(C + U)$.
1 Introduction

Karp, Upfal, and Wigderson [11], and Mulmuley, Vazirani, and Vazirani [14], have recently
developed randomized NC (RNC)^1 parallel algorithms for the minimum-cost perfect matching
problem, assuming that the input is given in unary. For both of these algorithms, the
number of processors needed is proportional to the magnitude of the largest number. In
this note, we show how to convert these algorithms into algorithms that use a number of
processors that is independent of the magnitude of the largest number, provided that the
graph is bipartite. As a tradeoff, we get an increase in the time spent that is proportional
to the logarithm of the magnitude of the largest edge cost. Let n be the number of nodes
in  the  graph,  and  C  the  largest  cost  in  the  matching  problem.  If  C  =  we  get 

algorithms that do less work, where work is the product of the number of processors used
and the time spent. Assuming that $C = O(n^{\log^k n})$ for some constant k, our algorithms are
in RNC. We achieve these results by using scaling, which reduces the problem of finding a
matching  in  a  graph  with  large  edge  costs  to  the  problem  of  finding  a  sequence  of  matchings 
in  a  sequence  of  graphs,  each  of  which  has  small  edge  costs.  The  algorithm  is  similar  to  the 
sequential  algorithms  of  [7]  and  [8],  as  it  iteratively  finds  an  assignment  and  dual  variables 
and  then  uses  these  to  ensure  that  during  the  next  iteration  the  graph  contains  a  matching 
of  small  total  cost.  By  iterating  this  algorithm  on  an  appropriately  derived  graph,  we  also 
obtain  improved  results  for  the  minimum-cost  flow  problem. 

Our  model  of  computation  is  a  parallel  random-access  machine  (PRAM)  [5].  We  define 
the  work  done  by  a  parallel  algorithm  as  the  product  of  the  number  of  processors  used  and 
the  time  spent. 

2  Preliminaries 

Let G = (V, E, c) be an undirected bipartite graph with node set V, edge set E, and an
integral cost c(v, w) associated with each edge (v, w). We will denote the two sets of nodes
in the bipartition as $V_1$ and $V_2$, and let $n = |V_1| = |V_2|$ and $m = |E|$.

A matching on a graph is a set M of edges such that each node is incident to no more
than one edge from M. A perfect matching is a matching in which every node is incident
to exactly one matched edge. If the edges have costs, the cost of a matching is the sum of
the costs of the edges in the matching. A minimum-cost perfect matching (MCPM) is a
perfect matching with the smallest possible cost. In a bipartite graph, an MCPM is also
called an assignment.

^1 NC is the class of algorithms that, on input of size n, use $n^{k_1}$ processors and
$O(\log^{k_2} n)$ time, for some constants $k_1$ and $k_2$. RNC algorithms are NC algorithms
that allow each processor to generate an $O(\log n)$-bit random number at each step in the
computation and return the correct answer with probability greater than 1/2.

It will be convenient to associate an integer-valued dual variable d(v)
with each node v. This allows us to define $c_d(v, w)$, the reduced cost of edge (v, w) with
respect to dual variables d, by $c_d(v, w) = c(v, w) + d(v) - d(w)$. Let M be a matching. We
say that a set of dual variables is tight if

$c_d(v, w) \ge 0 \quad \forall (v, w) \in E$    (1)

$c_d(v, w) = 0 \quad \forall (v, w) \in M$    (2)
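
To make conditions (1) and (2) concrete, the following minimal Python sketch checks whether a given set of duals is tight for a matching. The function names `reduced_cost` and `is_tight`, the dictionary-based graph representation, and the 2 x 2 instance are illustrative assumptions, not part of the original text.

```python
def reduced_cost(c, d, v, w):
    """Reduced cost c_d(v, w) = c(v, w) + d(v) - d(w)."""
    return c[(v, w)] + d[v] - d[w]


def is_tight(c, d, matching):
    """Check conditions (1) and (2) for duals d and a matching.

    c        -- dict mapping each edge (v, w) to its integer cost
    d        -- dict mapping each node to its integer dual variable
    matching -- set of edges (v, w), a subset of the keys of c
    """
    nonnegative = all(reduced_cost(c, d, v, w) >= 0 for (v, w) in c)
    zero_on_matching = all(reduced_cost(c, d, v, w) == 0 for (v, w) in matching)
    return nonnegative and zero_on_matching


# Hypothetical instance: left nodes "a", "b"; right nodes "x", "y".
cost = {("a", "x"): 1, ("a", "y"): 3, ("b", "x"): 2, ("b", "y"): 2}
duals = {"a": 0, "b": 0, "x": 1, "y": 2}
M = {("a", "x"), ("b", "y")}
tight = is_tight(cost, duals, M)
```

Here the matched edges have reduced cost 0 and the two unmatched edges have reduced cost 1, so the duals are tight; with all duals set to 0, condition (2) fails on edge ("a", "x").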

The first algorithm for the assignment problem was Kuhn's Hungarian algorithm [12].
Implemented with Fibonacci heaps [6], this algorithm runs in $O(nm + n^2 \log n)$ time, which
remains the fastest strongly polynomial algorithm for the assignment problem. Gabow and
Tarjan [8] have developed an algorithm that runs in $O(\sqrt{n}\, m \log(nC))$ time. There are no
known NC algorithms for the assignment problem; however, there are RNC algorithms
under the assumption that the input is given in unary. The first RNC algorithm under
this assumption was given by Karp, Upfal, and Wigderson [11]. An implementation of
this algorithm by Galil and Pan [9] uses $(n + C')M(n)$ processors and $O(\log n \log^2(nC'))$
time, where C' is an upper bound on the maximum cost of any matching, and M(n) is
the minimum number of processors needed to multiply two n x n matrices. Currently
$M(n) = O(n^{2.376})$ [2], and trivially, $M(n) = \Omega(n^2)$. Subsequently, a faster algorithm was
discovered by Mulmuley, Vazirani, and Vazirani [14] that finds an assignment in $O(\log^2 n)$
time using $nmCM(n)$ processors, where C is the largest edge cost in the input graph. As
neither one of these algorithms does less work than the other on all graphs, we will give our
improvements relative to both of these algorithms.

3  A  Scaling  Algorithm 

Our  algorithm  is  a  scaling  algorithm,  similar  in  general  structure  to  the  sequential  matching 
algorithm of Gabow and Tarjan [8]. The algorithm proceeds in log C iterations. At the
beginning  of  each  iteration,  one  bit  is  added  to  the  costs  and  dual  variables.  Then  a 
perfect  matching  and  tight  dual  variables  are  found  on  the  graph  with  edges  of  reduced  cost 
no  greater  than  2n.  The  new  dual  variables  are  added  to  the  old  ones  and  the  iteration 
terminates.  The  details  of  the  algorithm  appear  in  Figure  1. 

The algorithm of [8] works in a similar manner, except that in their algorithm each
iteration involves finding only an approximate matching. At each step, we find an exact
matching and tight dual variables. By working with reduced costs at each iteration, we
ensure that not only is the cost of the new matching close to that of the old matching,
but also that the magnitude of the edge costs used in each iteration is small. Thus we can
reduce the assignment problem to a series of log C assignment problems, each in a graph
with small edge costs.

Input: G = (V, E, c), an undirected bipartite graph with bipartition $V_1$ and $V_2$ and cost
c(v, w) on edge (v, w). Assume that G contains a perfect matching.

Output: A minimum-cost perfect matching M.

1  $d(v) \leftarrow 0 \ \forall v \in V$.
   $c(v, w) \leftarrow 0 \ \forall (v, w) \in E$.
   Let $C = \max_{(v,w) \in E} \{|c(v, w)|\}$.

2  For i = 1 to $\lceil \log_2 C \rceil$

3      $d(v) \leftarrow 2d(v) \ \forall v \in V$.
       $c(v, w) \leftarrow 2c(v, w)$ + (the signed bit of c(v, w)) $\forall (v, w) \in E$.

4      Let $E' = \{(v, w) \mid (v, w) \in E \text{ and } c_d(v, w) \le 2n\}$.

5      Compute M, a MCPM in $G' = (V, E', c_d)$.

6      Compute tight dual variables for G' with respect to matching M using costs $c_d$. Let
       $\Delta(v)$ be the dual variable associated with v.

7      $d(v) \leftarrow d(v) + \Delta(v) \ \forall v \in V$.

8  Output M, a minimum-cost perfect matching.

Figure 1: Algorithm ASSIGNMENT.
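
The scaling loop of Algorithm ASSIGNMENT can be sketched sequentially in Python. This is only an illustration under several assumptions that are not in the original: the RNC matching subroutine is replaced by exhaustive search (`brute_force_mcpm`), the tight duals come from a sequential Bellman-Ford pass (`shortest_path_duals`), the loop runs once per bit of the largest absolute cost, and the duals are computed over all of E rather than only E', a conservative choice made here so that every edge keeps a non-negative reduced cost. All function names are hypothetical.

```python
from itertools import permutations


def brute_force_mcpm(left, right, cost):
    """Exhaustive MCPM restricted to the edges in `cost`; tiny n only."""
    best, best_value = None, None
    for perm in permutations(right):
        pairs = list(zip(left, perm))
        if all(p in cost for p in pairs):
            value = sum(cost[p] for p in pairs)
            if best_value is None or value < best_value:
                best, best_value = pairs, value
    return best


def shortest_path_duals(left, right, cost, matching):
    """Bellman-Ford distances from a new source s in the residual network."""
    s = object()
    big = len(left) * max(abs(x) for x in cost.values())
    arcs = [(v, w, x) for (v, w), x in cost.items()]
    arcs += [(w, v, -cost[(v, w)]) for (v, w) in matching]
    arcs += [(s, w, big) for w in right]
    dist = {u: float("inf") for u in list(left) + list(right)}
    dist[s] = 0
    for _ in range(len(dist) - 1):
        for u, v, x in arcs:
            if dist[u] + x < dist[v]:
                dist[v] = dist[u] + x
    del dist[s]
    return dist


def assignment(left, right, c):
    """Sequential sketch of the scaling loop; returns a sorted list of edges."""
    n = len(left)
    bits = max(1, max(abs(x) for x in c.values()).bit_length())
    d = {u: 0 for u in list(left) + list(right)}
    matching = None
    for i in range(1, bits + 1):
        shift = bits - i
        # Step 3: double the duals and reveal one more (signed) bit of cost.
        d = {u: 2 * x for u, x in d.items()}
        cur = {e: (abs(x) >> shift) * (1 if x >= 0 else -1) for e, x in c.items()}
        reduced = {(v, w): x + d[v] - d[w] for (v, w), x in cur.items()}
        # Step 4: keep only edges of reduced cost at most 2n.
        small = {e: x for e, x in reduced.items() if x <= 2 * n}
        # Step 5: MCPM on the small-cost graph (brute force stands in for RNC).
        matching = brute_force_mcpm(left, right, small)
        # Step 6: tight duals, computed over all of E in this sketch.
        delta = shortest_path_duals(left, right, reduced, matching)
        # Step 7: accumulate the duals.
        d = {u: d[u] + delta[u] for u in d}
    return sorted(matching)


# Hypothetical 2 x 2 instance.
example = {("a", "x"): 1, ("a", "y"): 3, ("b", "x"): 2, ("b", "y"): 2}
result = assignment(["a", "b"], ["x", "y"], example)  # [('a', 'x'), ('b', 'y')]
```

In every iteration the subroutine only sees reduced costs between -1 and 2n, which is the property the analysis below relies on.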

Before proceeding to analyze our algorithm, we address the issue of finding tight duals.
In general, given an optimal flow for a minimum-cost flow problem, a single shortest-path
computation suffices to find tight dual variables (see, for example, [1]). Specialized to
the assignment problem, the algorithm is as follows. First create a directed residual network
G' = (V', E') such that

• for each $(v, w) \in E$, there is an edge $(v, w) \in E'$ with cost c(v, w),

• for each $(v, w) \in M$, there is an edge $(w, v) \in E'$ with cost -c(v, w),

• there is a new node $s \in V_1$ and, for each node $w \in V_2$, an edge (s, w) of cost nC.

Now let d(v) be the shortest-path distance from s to v, for every node v. It is well known that
the d(v) that result are tight dual variables for the matching problem.
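
The construction above can be sketched in Python as follows. The parallel algorithm would use a matrix-based shortest-path computation; this sketch substitutes a sequential Bellman-Ford pass, and the name `tight_duals`, the dictionary representation, and the example instance are assumptions made for illustration. It presumes that the given matching is a minimum-cost perfect matching, so the residual network has no negative cycle.

```python
def tight_duals(left, right, c, matching):
    """Shortest-path distances from a new source s in the residual network.

    left, right -- the node lists V1 and V2
    c           -- dict mapping each edge (v, w), v in V1, w in V2, to its cost
    matching    -- a minimum-cost perfect matching, as a set of edges
    """
    n = len(left)
    C = max(abs(x) for x in c.values())
    s = object()  # fresh source node, distinct from every input node
    arcs = [(v, w, x) for (v, w), x in c.items()]            # forward edges
    arcs += [(w, v, -c[(v, w)]) for (v, w) in matching]      # reversed matched edges
    arcs += [(s, w, n * C) for w in right]                   # source edges of cost nC
    dist = {u: float("inf") for u in list(left) + list(right)}
    dist[s] = 0
    for _ in range(len(dist) - 1):
        changed = False
        for u, v, x in arcs:
            if dist[u] + x < dist[v]:
                dist[v] = dist[u] + x
                changed = True
        if not changed:
            break
    del dist[s]
    return dist


# Hypothetical instance with optimal matching {("a", "x"), ("b", "y")}.
cost = {("a", "x"): 1, ("a", "y"): 3, ("b", "x"): 2, ("b", "y"): 2}
M = {("a", "x"), ("b", "y")}
d = tight_duals(["a", "b"], ["x", "y"], cost, M)
# d == {"a": 5, "b": 4, "x": 6, "y": 6}
```

With these distances, both matched edges have reduced cost c(v, w) + d(v) - d(w) = 0 and every other edge has non-negative reduced cost.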

We now demonstrate that Algorithm Assignment described in Figure 1 is correct and
efficient. The key to the efficiency of this algorithm is that when we find a matching in step
5, we may ignore edges with reduced cost greater than 2n. It remains to be shown that this
does not change the value of the MCPM.

Lemma 3.1 The graph $G' = (V, E', c_d)$ formed in step 4 of algorithm Assignment always
contains a MCPM of total cost no more than n. Further, $c_d(v, w) \ge -1 \ \forall (v, w) \in E$.

Proof: We will prove this by induction on the number of executions of step 4. During the
first execution, all reduced edge costs are either -1, 0 or 1, so the lemma is true. Assume
that it is true after step 4 on iteration i - 1. Then, at the beginning of the loop, the costs
$c_d$ are tight with respect to $\Delta$. Because for all edges (v, w)

$c_d(v, w) + \Delta(v) - \Delta(w) = c(v, w) + (d(v) + \Delta(v)) - (d(w) + \Delta(w))$,

it is also true that c is tight with respect to $d + \Delta$.

Thus, in the graph G = (V, E, c), the reduced cost of the edges of M with respect to
$d + \Delta$ is 0 and the reduced cost of every edge is non-negative. Let $d \leftarrow d + \Delta$. Now consider
the effect of adding a new bit of cost and updating the dual variables in step 3 of iteration i.
After multiplying all dual variables by 2 and all costs by 2, the reduced cost of each edge is
still non-negative and the reduced cost of each edge in M is still 0. After adding a new bit
of cost, each edge has reduced cost greater than or equal to -1, and each edge in M has
reduced cost at most 1. Therefore, the sum of the costs of the edges in M is at most n. ■

Corollary 3.2 In the graph $G' = (V, E', c_d)$ formed in step 4 of algorithm Assignment,
there always exists a MCPM M in which $c_d(v, w) \le 2n \ \forall (v, w) \in M$.

Proof: Assume, to the contrary, that the MCPM contains an edge of value greater than 2n.
By Lemma 3.1 every other edge in the matching has value at least -1. Therefore this matching
has value greater than 2n + (n - 1)(-1) = n + 1. But by Lemma 3.1, we know that there
exists a perfect matching of value at most n; therefore the one we have cannot be the
minimum one. ■

This  leads  to  our  main  result. 

Theorem 3.3 Let algorithm A be a randomized parallel algorithm for MCPM that uses
$C f(n, m)$ processors and $O(\log^k n)$ time, where f(n, m) is a polynomial in n and m and k
is a non-negative integer. Using algorithm Assignment we can convert algorithm A into an
algorithm for MCPM that uses $n f(n, m) + M(n)$ processors and $O((\log^k n + \log^2 n) \log C)$
time.

Proof: First we must verify that our algorithm actually finds a MCPM. From Corollary
3.2, we see that ignoring edges of reduced cost greater than 2n does not change the value
of the MCPM. Therefore, at each step we find a valid MCPM with respect to the reduced
costs. Because passing to reduced costs changes the cost of every perfect matching by the
same amount, an MCPM with respect to the reduced costs is also an MCPM with respect to
the actual costs. In the last iteration the current edge costs are the same as the edge costs
of the input graph, so the matching found is an MCPM of the input graph, which proves
correctness. To derive the resource bounds, observe that whenever we find a
MCPM in step 5, $C \le 2n$. Procedure Compute Tight Duals is dominated by the shortest-path
computation, which takes $O(\log^2 n)$ time on M(n) processors. All other steps in the
algorithm can be implemented in constant time on O(m + n) processors. Combining these
observations with the fact that there are only log C iterations of the main loop, the theorem
follows. ■

Corollary 3.4

• Algorithm Assignment, combined with the matching algorithm of [11], yields a randomized
parallel algorithm for computing an MCPM using $n^2 M(n)$ processors and
$O(\log^3 n \log C)$ time.

• Algorithm Assignment, combined with the matching algorithm of [14], yields a randomized
parallel algorithm for computing an MCPM using $n^2 m M(n)$ processors in
$O(\log^2 n \log C)$ time.

Proof: Immediate from Theorem 3.3 and the algorithms in [11], [9], and [14]. ■
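
The processor counts in this corollary can be checked by substituting the per-iteration cost bound into the bounds quoted in Section 2. The following is a sketch of that arithmetic, assuming that within one iteration every reduced cost is at most 2n in magnitude, so that the largest edge cost satisfies $C \le 2n$ and the cost of any matching satisfies $C' \le n \cdot 2n$:

```latex
\begin{align*}
(n + C')\,M(n) &\le (n + 2n^2)\,M(n) = O\!\left(n^2 M(n)\right)
  && \text{(implementation of [9])} \\
n m C\,M(n) &\le n m (2n)\,M(n) = O\!\left(n^2 m\,M(n)\right)
  && \text{(algorithm of [14])}
\end{align*}
```

Under the same assumption, the per-iteration time of the first algorithm is $O(\log n \log^2(n \cdot 2n^2)) = O(\log^3 n)$ and that of the second is $O(\log^2 n)$; multiplying by the log C iterations of the main loop gives the stated running times.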

Observe  that  our  algorithm  performs  less  work  in  the  case  that  C  =  for  some 

c  >  0.  Further,  our  algorithm  outperforms  the  old  algorithms  by  a  factor  of  so 

as  C  gets  larger,  our  algorithm  becomes  even  more  efficient  than  the  previous  algorithms. 

We can extend this algorithm for a minimum-cost perfect matching to one that finds a
minimum-cost (not necessarily perfect) matching. (We allow the edge costs to be negative.
Otherwise, the optimum is the null matching.) Let G = (V, E, c) be a graph in which we
would like to find a minimum-cost matching. We employ the standard transformation of
using an augmented graph $G' = (V, V_1 \times V_2, c')$, where c'(v, w) = c(v, w) if $(v, w) \in E$ and
c'(v, w) = 0 otherwise. It is easy to see that if M' is a minimum-cost perfect matching in
G', then $M' \cap E$ corresponds to a minimum-cost matching in G.
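
A minimal Python sketch of this transformation follows. It makes one assumption that the text does not spell out: edges of non-negative cost are discarded before the graph is augmented, since such an edge never lowers the cost of a matching. The names `min_cost_matching` and `brute_force_mcpm` are hypothetical, and the exhaustive solver merely stands in for any MCPM routine.

```python
from itertools import permutations


def brute_force_mcpm(left, right, cost):
    """Exhaustive MCPM on a complete bipartite graph; for tiny examples only."""
    return min(
        (list(zip(left, perm)) for perm in permutations(right)),
        key=lambda pairs: sum(cost[p] for p in pairs),
    )


def min_cost_matching(left, right, c, solve_mcpm=brute_force_mcpm):
    """Minimum-cost (not necessarily perfect) matching via the augmented graph."""
    # Assumption of this sketch: only negative-cost edges can help.
    useful = {e: x for e, x in c.items() if x < 0}
    # c'(v, w) = c(v, w) on kept edges of E and 0 on every other pair of V1 x V2.
    c_prime = {(v, w): useful.get((v, w), 0) for v in left for w in right}
    m_prime = solve_mcpm(left, right, c_prime)
    # Keep only the edges of M' that are real (kept) edges of G.
    return sorted(e for e in m_prime if e in useful)


# Hypothetical instance: the best matching uses the single edge ("a", "x").
c = {("a", "x"): -5, ("a", "y"): -1, ("b", "x"): -3, ("b", "y"): 5}
best = min_cost_matching(["a", "b"], ["x", "y"], c)  # [('a', 'x')]
```

Passing a different `solve_mcpm`, for example the scaling algorithm, changes only how the perfect matching in G' is found.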

Corollary 3.5 Given a graph G with maximum edge cost C, running algorithm Assignment
on G' yields an algorithm that finds a minimum-cost matching using $n^2 M(n)$ processors and
$O(\log^3 n \log C)$ time, or $n^4 M(n)$ processors and $O(\log^2 n \log C)$ time.

4  Minimum  Cost  Flow  Problems 

Our  technique  also  yields  improved  results  for  the  minimum-cost  flow  problem.  We  assume 
familiarity  with  the  minimum-cost  flow  problem  as  we  will  only  sketch  the  ideas  needed  to 
extend  the  class  of  minimum-cost  flow  instances  that  are  in  RNC.  We  refer  the  reader  to 
[1,  13,  10]  for  more  detail  on  minimum-cost  flows. 


The  first  idea  is  to  show  that  a  general  minimum-cost  flow  problem  can  be  reduced  to  a 
series  of  minimum-cost  flow  problems,  each  with  small  edge  capacities.  This  fact  is  implicit 
in the algorithm of Edmonds and Karp [4]; here we make it explicit.

Lemma 4.1 Let P be a minimum-cost flow problem with n nodes, m edges, maximum edge
cost C and maximum edge capacity U. Then P can be reduced, in NC, to the solution of
$O(\log U)$ minimum-cost circulation problems, each with n nodes, m edges, maximum edge
cost C, maximum edge capacity m, and total supply m.

Proof: As in [4], we will introduce the capacities one bit at a time. We will now describe
how to convert a minimum-cost flow with respect to the first i bits of the capacities to one
with respect to the first i + 1 bits. Given a flow f that is minimum-cost with respect to
capacities u, we first double the flow and double the capacities. This still gives a minimum-cost
flow. Next we introduce a new bit of capacity. The new flow may no longer be
minimum-cost, but as is discussed in [4] it is "close" to a minimum-cost flow. We now
bring the edges "in-kilter" by saturating residual edges with negative reduced cost. This
introduces supplies and demands at nodes, but since we have added at most one unit of
capacity to each edge, the sum of all the supplies is at most m. Further, since no edge capacity
need be more than the sum of supplies, we can limit all edge capacities to m also. So we
now have a minimum-cost circulation problem with n nodes, m edges, maximum edge cost
C, maximum edge capacity m, and total supply m. We discuss below how to solve such
a problem. Since the initial capacities are at most U, we need repeat this process only
$O(\log U)$ times. ■
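
The bit-by-bit introduction of capacities can be illustrated with a short Python sketch. The helper name `capacity_prefixes` and the dictionary representation of capacities are assumptions made for illustration; the sketch only shows that each phase doubles the previous capacities and adds at most one unit per edge, which is what bounds the newly created supply by m.

```python
def capacity_prefixes(u):
    """Return the capacities restricted to their first 1, 2, ..., B bits.

    u -- dict mapping each edge to a non-negative integer capacity
    B is the bit length of the largest capacity (at least 1).
    """
    bits = max(1, max(u.values()).bit_length())
    return [{e: x >> (bits - i) for e, x in u.items()} for i in range(1, bits + 1)]


# Hypothetical capacities on three edges.
u = {"e1": 5, "e2": 2, "e3": 7}
phases = capacity_prefixes(u)

for previous, current in zip(phases, phases[1:]):
    # New capacity = doubled old capacity plus one freshly revealed bit.
    added = {e: current[e] - 2 * previous[e] for e in u}
    assert all(bit in (0, 1) for bit in added.values())
    assert sum(added.values()) <= len(u)  # at most m new units of capacity
```

The last phase reproduces the input capacities, so the number of phases equals the number of bits of U.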

We are now left with the problem of solving a minimum-cost circulation problem with
n nodes, m edges, maximum edge cost C, maximum edge capacity m, and total supply
m. Using a standard transformation (see, for example, [1], [15]), we can convert this capacitated
flow problem into an uncapacitated transportation problem with m + n nodes, 2m
edges, maximum demand m and maximum cost C. We now use another standard transformation
(see, for example, [13]) that converts an uncapacitated transportation problem into
an assignment problem. If node v has demand d(v), we make d(v) copies of that node. For
each edge (v, w) we put an edge between every copy of v and every copy of w. This creates
an assignment problem with $O(m^2)$ nodes, $O(m^3)$ edges, and maximum edge cost C. We
now apply algorithm ASSIGNMENT to solve this problem.
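
The node-copying step of this reduction can be sketched in Python as below. The function name `expand_to_assignment`, the use of (node, index) pairs to name copies, and the small instance are illustrative assumptions; `copies[v]` plays the role of the supply or demand d(v) of node v.

```python
def expand_to_assignment(copies, c):
    """Replace each node v by copies[v] copies and each edge by a complete bundle.

    copies -- dict mapping each node to its (positive) supply or demand
    c      -- dict mapping each edge (v, w) to its cost
    Returns the list of copied nodes and a dict of copied edges with costs.
    """
    nodes = [(v, k) for v, count in copies.items() for k in range(count)]
    edges = {
        ((v, i), (w, j)): x
        for (v, w), x in c.items()
        for i in range(copies[v])
        for j in range(copies[w])
    }
    return nodes, edges


# Hypothetical transportation instance: one source with supply 2, two unit sinks.
copies = {"s1": 2, "t1": 1, "t2": 1}
c = {("s1", "t1"): 4, ("s1", "t2"): 6}
nodes, edges = expand_to_assignment(copies, c)
```

Each original edge (v, w) becomes copies[v] * copies[w] edges of the same cost, which is where the growth in the number of edges comes from.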

Theorem 4.2 We can solve a minimum-cost flow problem using $m^4 M(m^2)$ processors and
$O(\log^3 n \log C \log U)$ time or using $m^7 M(m^2)$ processors and $O(\log^2 n \log C \log U)$ time.

When $C = \Omega(m^2)$, this compares favorably with the previous results of $C m^2 M(m^2)$
processors and $O(\log^3 n \log U)$ time or $C m^5 M(m^2)$ processors and $O(\log^2 n \log U)$ time.
5 Conclusion

We have given an algorithm for the assignment problem that performs less work than the
previously known RNC algorithms. It has the appealing feature that the number of
processors is independent of the size of the numbers. We have also extended the class
of assignment problems that can be solved in RNC. Previously, it was required that
$C = O(n^k)$ for some constant k. With these results, problems with $C = O(n^{\log^k n})$ for
some constant k are now in RNC. We have also extended the class of minimum-cost flow
problems that are in the class RNC.

In  contrast  with  previous  algorithms,  the  matching  algorithm  only  works  for  bipartite 
graphs.  This  is  because  the  problem  of  finding  tight  dual  variables  in  general  graphs  appears 
to  be  no  easier  than  actually  finding  a  matching,  even  sequentially  [3]. 